Getting and Plotting stock data

by Hrant Davtyan

This Jupyter notebook describes the steps needed to get and plot stock data using Python. A standout package for this is pandas-datareader, which provides a simple interface for getting data from Google Finance, Yahoo! Finance, the World Bank, etc. (Newer releases of pandas-datareader have deprecated the Google Finance source, so the "google" calls below may not work with a recent version.)

  • to install pandas-datareader, open a new command prompt (terminal) window and type the following command: '''pip install pandas-datareader'''
  • to learn more about pandas-datareader and the websites it can get data from, check its official documentation

To start using pandas-datareader (assuming it is already installed on our computer), we first need to import it. As its name is quite long, we will import its data module under the shorter name web.


In [1]:
import pandas_datareader.data as web

The DataReader function from the library imported above provides the data on stocks available in Google/Yahoo! Finance. The function takes two main arguments: the name (ticker symbol) of the stock and the name of the data source. Both are text (strings), so they should be written in quotes.

Let's get the IBM stock data from Google Finance.


In [2]:
data = web.DataReader("IBM","google")

Now the IBM stock data is downloaded and saved in our variable called data. To view the first 5 observations (rows) of the data, the head() function can be used:


In [3]:
data.head()


Out[3]:
Open High Low Close Volume
Date
2010-01-04 131.18 132.97 130.85 132.45 6155846
2010-01-05 131.68 131.85 130.10 130.85 6842471
2010-01-06 130.68 131.49 129.81 130.00 5605290
2010-01-07 129.87 130.25 128.91 129.55 5840569
2010-01-08 129.07 130.92 129.05 130.85 4197105
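If the download does not work (for example, because your version of pandas-datareader no longer supports the data source), a small DataFrame with the same layout can be typed in by hand to follow along. The sketch below rebuilds the five IBM rows printed above; the variable name sample is hypothetical and not part of the original notebook.

```python
import pandas as pd

# The five IBM rows printed by data.head() above, typed in by hand
rows = {
    "Open":   [131.18, 131.68, 130.68, 129.87, 129.07],
    "High":   [132.97, 131.85, 131.49, 130.25, 130.92],
    "Low":    [130.85, 130.10, 129.81, 128.91, 129.05],
    "Close":  [132.45, 130.85, 130.00, 129.55, 130.85],
    "Volume": [6155846, 6842471, 5605290, 5840569, 4197105],
}
dates = pd.to_datetime(["2010-01-04", "2010-01-05", "2010-01-06",
                        "2010-01-07", "2010-01-08"])

# Use the dates as the row labels, just like the downloaded data
sample = pd.DataFrame(rows, index=dates)
sample.index.name = "Date"

print(sample.head())
```

Only these five rows are available this way, so later steps that need years of data still require a working download.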

Please note that head() returns the first 5 observations by default (whenever nothing else is given inside the parentheses). To view the first 10 observations instead, just pass 10 as an argument to head():


In [4]:
data.head(10)


Out[4]:
Open High Low Close Volume
Date
2010-01-04 131.18 132.97 130.85 132.45 6155846
2010-01-05 131.68 131.85 130.10 130.85 6842471
2010-01-06 130.68 131.49 129.81 130.00 5605290
2010-01-07 129.87 130.25 128.91 129.55 5840569
2010-01-08 129.07 130.92 129.05 130.85 4197105
2010-01-11 131.06 131.06 128.67 129.48 5731177
2010-01-12 129.03 131.33 129.00 130.51 8083354
2010-01-13 130.39 131.12 129.16 130.23 6458302
2010-01-14 130.55 132.71 129.91 132.31 7114544
2010-01-15 132.03 132.89 131.09 131.78 8502320

Similarly, one can view the last 5 (or 10) observations by using the tail() function instead of head():


In [5]:
data.tail()


Out[5]:
Open High Low Close Volume
Date
2017-05-24 152.21 152.76 151.23 152.51 3732399
2017-05-25 153.25 153.73 152.95 153.20 2582815
2017-05-26 152.85 153.00 152.06 152.49 2443507
2017-05-30 151.95 152.67 151.59 151.73 3666032
2017-05-31 152.03 152.80 151.65 152.63 3543404
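head() and tail() take the same optional number of rows, and they also work on a single column (a Series). The sketch below uses a hypothetical close Series built from the Close values printed earlier; it also shows that a negative number passed to head() drops that many rows from the end.

```python
import pandas as pd

# Close prices of the first five IBM trading days shown earlier
close = pd.Series(
    [132.45, 130.85, 130.00, 129.55, 130.85],
    index=pd.to_datetime(["2010-01-04", "2010-01-05", "2010-01-06",
                          "2010-01-07", "2010-01-08"]),
    name="Close",
)

first_two = close.head(2)       # first 2 observations
last_two = close.tail(2)        # last 2 observations
all_but_last = close.head(-1)   # negative n: everything except the last row

print(first_two)
print(last_two)
print(len(all_but_last))
```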

The resulting dataset (i.e. the variable called data) is of a type known as DataFrame, as it represents the data inside a frame (a table of rows and columns). We can confirm this using the type() function.


In [6]:
type(data)


Out[6]:
pandas.core.frame.DataFrame
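Besides type(), a DataFrame has a few attributes that describe its contents. The sketch below, on a hypothetical two-row sample copied from the IBM output above, prints the number of rows and columns (shape), the column names, the data type of each column (dtypes), and the type of the date index.

```python
import pandas as pd

# Two IBM rows copied from the output above
sample = pd.DataFrame(
    {"Open": [131.18, 131.68], "High": [132.97, 131.85],
     "Low": [130.85, 130.10], "Close": [132.45, 130.85],
     "Volume": [6155846, 6842471]},
    index=pd.to_datetime(["2010-01-04", "2010-01-05"]),
)

print(type(sample))          # <class 'pandas.core.frame.DataFrame'>
print(sample.shape)          # (number of rows, number of columns)
print(list(sample.columns))  # column names
print(sample.dtypes)         # float64 for prices, an integer type for Volume
print(type(sample.index))    # the dates form a DatetimeIndex
```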

DataFrames are very user-friendly types to work with. Many operations on DataFrames are similar to those we did with lists. For example, to choose only one column of the DataFrame, one just needs to put the name of the chosen column inside square brackets (note, the name is a string, so it should be in quotes):


In [7]:
data["Open"]


Out[7]:
Date
2010-01-04    131.18
2010-01-05    131.68
2010-01-06    130.68
2010-01-07    129.87
2010-01-08    129.07
2010-01-11    131.06
2010-01-12    129.03
2010-01-13    130.39
2010-01-14    130.55
2010-01-15    132.03
2010-01-19    131.63
2010-01-20    130.46
2010-01-21    130.47
2010-01-22    128.67
2010-01-25    126.33
2010-01-26    125.92
2010-01-27    125.82
2010-01-28    127.03
2010-01-29    124.32
2010-02-01    123.23
2010-02-02    124.79
2010-02-03    125.16
2010-02-04    125.19
2010-02-05    123.04
2010-02-08    123.15
2010-02-09    122.65
2010-02-10    122.94
2010-02-11    122.58
2010-02-12    123.01
2010-02-16    124.91
               ...  
2017-04-19    161.76
2017-04-20    161.32
2017-04-21    162.05
2017-04-24    161.29
2017-04-25    161.78
2017-04-26    160.53
2017-04-27    160.29
2017-04-28    160.50
2017-05-01    160.05
2017-05-02    159.44
2017-05-03    158.74
2017-05-04    158.89
2017-05-05    153.52
2017-05-08    152.80
2017-05-09    152.60
2017-05-10    151.65
2017-05-11    151.05
2017-05-12    150.30
2017-05-15    150.62
2017-05-16    151.66
2017-05-17    153.30
2017-05-18    150.86
2017-05-19    151.01
2017-05-22    152.10
2017-05-23    152.57
2017-05-24    152.21
2017-05-25    153.25
2017-05-26    152.85
2017-05-30    151.95
2017-05-31    152.03
Name: Open, dtype: float64
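As the last line of the output shows, a single column comes back as a one-dimensional Series rather than a DataFrame. The sketch below, on a hypothetical three-row sample of the IBM data, contrasts this with passing a list of column names, which returns a smaller DataFrame, and with attribute access (sample.Open), which pandas allows for column names that are valid Python identifiers.

```python
import pandas as pd

# Three IBM rows copied from the earlier output
sample = pd.DataFrame(
    {"Open": [131.18, 131.68, 130.68], "High": [132.97, 131.85, 131.49],
     "Low": [130.85, 130.10, 129.81], "Close": [132.45, 130.85, 130.00]},
    index=pd.to_datetime(["2010-01-04", "2010-01-05", "2010-01-06"]),
)

open_prices = sample["Open"]            # one name in brackets -> a Series
same_column = sample.Open               # attribute access, simple names only
open_close = sample[["Open", "Close"]]  # a list of names -> a DataFrame

print(type(open_prices).__name__)  # Series
print(type(open_close).__name__)   # DataFrame
print(open_close)
```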

Similarly, one can select only some rows of the DataFrame by date (note: with this square-bracket syntax, the date has to be given as a year or as a year and month, such as "2015-05"):


In [8]:
data["2015-05"]


Out[8]:
Open High Low Close Volume
Date
2015-05-01 173.20 174.00 172.42 173.67 3312052
2015-05-04 174.47 176.30 173.70 173.97 4027978
2015-05-05 173.51 174.23 171.96 173.08 3593620
2015-05-06 172.90 174.05 168.86 170.05 3612606
2015-05-07 169.63 171.98 169.04 170.99 2472687
2015-05-08 172.94 173.33 172.24 172.68 3092602
2015-05-11 172.65 172.99 170.86 171.12 2661030
2015-05-12 170.55 171.49 168.84 170.55 2962412
2015-05-13 171.24 172.74 170.75 172.28 2457521
2015-05-14 173.50 174.40 173.22 174.05 2439070
2015-05-15 173.91 174.41 172.60 173.26 2916579
2015-05-18 173.44 173.49 172.30 173.06 1970630
2015-05-19 172.97 173.75 171.93 173.48 2523002
2015-05-20 173.33 174.44 172.46 173.76 2300693
2015-05-21 173.32 174.14 173.04 173.34 2295596
2015-05-22 173.04 173.39 172.19 172.22 2849692
2015-05-26 172.11 172.12 169.13 170.13 3854170
2015-05-27 171.16 172.48 170.49 172.00 2764378
2015-05-28 171.45 171.84 170.66 171.71 1731372
2015-05-29 171.35 171.35 169.65 169.65 4091981
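Newer pandas versions deprecate selecting rows with a plain string in square brackets (as in data["2015-05"]) and recommend the .loc indexer instead. With .loc, a year-month string still selects the whole month, and a full date selects a single row. The sketch below uses a hypothetical prices DataFrame with a few Open/Close values copied from the outputs above.

```python
import pandas as pd

# A few IBM rows copied from the outputs above
prices = pd.DataFrame(
    {"Open": [173.20, 174.47, 171.35, 170.21],
     "Close": [173.67, 173.97, 169.65, 170.18]},
    index=pd.to_datetime(["2015-05-01", "2015-05-04",
                          "2015-05-29", "2015-06-01"]),
)

may_2015 = prices.loc["2015-05"]    # year-month string: every row in May 2015
one_day = prices.loc["2015-05-04"]  # full date: a single row (as a Series)

print(may_2015)
print(one_day)
```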

A specific date range is also acceptable as an input:


In [9]:
data["2015-05":"2016-05"]


Out[9]:
Open High Low Close Volume
Date
2015-05-01 173.20 174.00 172.42 173.67 3312052
2015-05-04 174.47 176.30 173.70 173.97 4027978
2015-05-05 173.51 174.23 171.96 173.08 3593620
2015-05-06 172.90 174.05 168.86 170.05 3612606
2015-05-07 169.63 171.98 169.04 170.99 2472687
2015-05-08 172.94 173.33 172.24 172.68 3092602
2015-05-11 172.65 172.99 170.86 171.12 2661030
2015-05-12 170.55 171.49 168.84 170.55 2962412
2015-05-13 171.24 172.74 170.75 172.28 2457521
2015-05-14 173.50 174.40 173.22 174.05 2439070
2015-05-15 173.91 174.41 172.60 173.26 2916579
2015-05-18 173.44 173.49 172.30 173.06 1970630
2015-05-19 172.97 173.75 171.93 173.48 2523002
2015-05-20 173.33 174.44 172.46 173.76 2300693
2015-05-21 173.32 174.14 173.04 173.34 2295596
2015-05-22 173.04 173.39 172.19 172.22 2849692
2015-05-26 172.11 172.12 169.13 170.13 3854170
2015-05-27 171.16 172.48 170.49 172.00 2764378
2015-05-28 171.45 171.84 170.66 171.71 1731372
2015-05-29 171.35 171.35 169.65 169.65 4091981
2015-06-01 170.21 171.04 169.03 170.18 2985479
2015-06-02 169.66 170.45 168.43 169.65 2571862
2015-06-03 170.50 171.56 169.63 169.92 2131031
2015-06-04 169.53 170.60 167.93 168.38 3079334
2015-06-05 168.25 168.91 167.20 167.40 3100505
2015-06-08 167.17 167.28 165.02 165.34 3758726
2015-06-09 165.34 166.02 163.37 165.68 3395901
2015-06-10 166.49 169.39 166.06 168.92 4680545
2015-06-11 169.26 170.44 168.54 168.78 3464013
2015-06-12 168.23 168.30 166.69 166.99 3065085
... ... ... ... ... ...
2016-04-19 146.47 146.95 142.61 144.00 13149148
2016-04-20 144.24 147.20 144.00 146.11 6721442
2016-04-21 146.58 150.12 146.46 149.30 5992604
2016-04-22 149.44 151.00 147.50 148.50 5190627
2016-04-25 148.16 148.90 147.11 148.81 2845511
2016-04-26 148.65 149.79 147.90 149.08 2978004
2016-04-27 149.35 150.78 148.97 150.47 3086611
2016-04-28 149.75 150.18 146.72 147.07 3771853
2016-04-29 146.49 147.34 144.19 145.94 4217744
2016-05-02 146.56 147.00 144.43 145.27 3499020
2016-05-03 144.65 144.90 142.90 144.13 3558829
2016-05-04 143.36 145.00 143.31 144.25 2575776
2016-05-05 145.95 147.30 145.45 146.47 6492015
2016-05-06 144.86 147.97 144.47 147.29 4882514
2016-05-09 147.70 148.20 147.01 147.34 4298800
2016-05-10 148.24 150.04 147.74 149.97 3982554
2016-05-11 149.71 151.09 148.74 148.95 3075213
2016-05-12 149.21 149.39 147.11 148.84 3247675
2016-05-13 148.79 149.86 147.42 147.72 2372098
2016-05-16 147.65 149.99 147.44 149.46 3061873
2016-05-17 149.21 149.50 147.29 148.00 3489779
2016-05-18 147.99 148.52 146.36 147.34 2482097
2016-05-19 146.48 146.93 143.96 144.93 3618752
2016-05-20 145.71 147.51 145.55 147.25 3576766
2016-05-23 147.61 147.95 146.66 146.77 2088554
2016-05-24 146.88 148.75 146.88 148.31 2827106
2016-05-25 148.93 152.09 148.50 151.69 4347009
2016-05-26 151.55 152.51 151.05 152.44 3042788
2016-05-27 152.35 152.93 152.15 152.84 2456289
2016-05-31 152.56 153.81 152.27 153.74 5836645

273 rows × 5 columns
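Note that the end of the range is inclusive: because "2016-05" is only a month, the slice runs through the last trading day of May 2016 (2016-05-31 in the output above). The sketch below checks this on a hypothetical close Series with a handful of Close values copied from the outputs; len() counts the selected rows, the same number the notebook reports under the table.

```python
import pandas as pd

# Close prices copied from the outputs above, spanning 2015 to 2017
close = pd.Series(
    [173.67, 170.18, 145.94, 145.27, 153.74, 152.63],
    index=pd.to_datetime(["2015-05-01", "2015-06-01", "2016-04-29",
                          "2016-05-02", "2016-05-31", "2017-05-31"]),
    name="Close",
)

# Both ends are partial dates, so the slice covers all of May 2015
# through the end of May 2016; the 2017 row is left out
window = close.loc["2015-05":"2016-05"]

print(window)
print(len(window))
```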

Now let's move forward and plot the data we received. For plotting purposes, the matplotlib.pyplot library is usually used in Python. Let's import it first. As its name is quite long, we will call it plt inside our Jupyter notebook.


In [10]:
import matplotlib.pyplot as plt

Let's first make a sample plot, and then move to our dataset. To make a plot and show it, one needs two functions from plt: plt.plot() and plt.show().


In [11]:
# a sample plot of bisector line
plt.plot([1,2,3,4],[1,2,3,4])
plt.show()


Please note that without the plt.show() function, the plot would be generated but not shown. This is sometimes useful when you want to generate a plot and save it instead of showing it. However, if one wants all generated plots to be shown automatically, s/he can run the following magic command (for example, right after importing matplotlib.pyplot):

''' %matplotlib inline '''

This command tells the Jupyter notebook to show all generated plots inline (inside the notebook). So if it is used, there is no need to type plt.show() every single time.

We did not use it, so we have to show each plot explicitly. Let's now plot the daily highest price of IBM stock.


In [12]:
plt.plot(data["High"])
plt.show()
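As mentioned above, a plot can also be saved to a file instead of being shown. The sketch below adds a title and axis labels, then writes the figure with plt.savefig(); the file name ibm_high.png is hypothetical, and the "Agg" backend is chosen only so the example runs without a display window. The data are the first five IBM daily highs from the earlier output.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; lets this run without a display
import matplotlib.pyplot as plt
import pandas as pd

# IBM daily highs for the first five trading days shown earlier
high = pd.Series(
    [132.97, 131.85, 131.49, 130.25, 130.92],
    index=pd.to_datetime(["2010-01-04", "2010-01-05", "2010-01-06",
                          "2010-01-07", "2010-01-08"]),
    name="High",
)

plt.plot(high)               # dates on the x axis, prices on the y axis
plt.title("IBM daily high")
plt.xlabel("Date")
plt.ylabel("Price")
plt.savefig("ibm_high.png")  # write the figure to a file instead of showing it
plt.close()                  # free the figure once it is saved
```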


Let's also get the Apple stock data and plot its daily highest price together with IBM's (to compare them).


In [13]:
data_apple = web.DataReader("AAPL", "google")

Now we have the Apple stock data too. To make two plots on the same graph, one just needs two plt.plot() calls followed by a single plt.show() call at the end:


In [14]:
plt.plot(data["High"])
plt.plot(data_apple["High"])
plt.show()
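With two lines on one graph, a legend makes it clear which line is which: give each plt.plot() call a label= and then call plt.legend(). In the notebook, the labels would be something like "IBM" and "AAPL". Since the Apple prices are not printed here, the sketch below uses IBM's High and Low columns from the earlier output as two stand-in lines, with the off-screen "Agg" backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no display window is needed
import matplotlib.pyplot as plt
import pandas as pd

dates = pd.to_datetime(["2010-01-04", "2010-01-05", "2010-01-06",
                        "2010-01-07", "2010-01-08"])
# Stand-ins for two stocks: IBM's High and Low values from the earlier output
high = pd.Series([132.97, 131.85, 131.49, 130.25, 130.92], index=dates)
low = pd.Series([130.85, 130.10, 129.81, 128.91, 129.05], index=dates)

plt.plot(high, label="High")  # label= names the line for the legend
plt.plot(low, label="Low")
plt.legend()                  # draws a box matching each label to its line

labels = [line.get_label() for line in plt.gca().get_lines()]
print(labels)
plt.close()
```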


If you are interested in customizing your plot (colors, appearance, etc.), you may check the official matplotlib tutorial, which covers many nice features available for the plt.plot() function (and beyond).
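As a small taste of those options, the sketch below redraws the sample bisector plot with a custom figure size (in inches), line color, dashed line style, line width and point markers, plus a grid. These are standard plt.figure() and plt.plot() keyword arguments; the particular values are arbitrary choices for illustration, and the "Agg" backend is used only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no display window is needed
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 4))  # width and height in inches
line, = plt.plot([1, 2, 3, 4], [1, 2, 3, 4],
                 color="red",      # line color
                 linestyle="--",   # dashed line
                 linewidth=2,      # thicker than the default
                 marker="o")       # a circle at each data point
plt.grid(True)

# Read the settings back from the figure and the line object
size = tuple(plt.gcf().get_size_inches())
style = (line.get_color(), line.get_linestyle(),
         line.get_linewidth(), line.get_marker())
print(size, style)
plt.close()
```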